Convergence of Monte Carlo Simulations to Equilibrium 
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We give two direct, elementary proofs that a Monte Carlo simulation converges to equilibrium 
provided that appropriate conditions are satisfied. The first proof requires detailed balance while 
the second is quite general. 



PACS numbers: 05.10.Ln, 75.40.Mg 

Monte Carlo simulations are widely used in statistical 
physics. If the algorithm satisfies the detailed balance 
condition, it is easy to show that the desired distribu- 
tion is a stationary distribution, i.e. if the system is, 
by some means, put into the desired distribution, it will 
subsequently stay in this distribution. It is harder, but 
nonetheless crucial, to also show that, starting from a 
general distribution, the algorithm will converge to the 
desired distribution. Although this can be proved with- 
out too much difficulty for a system with a finite num- 
ber of statesB, this proof is unfamiliar to most physicists. 
The argument in the original paper of Metropolis et al.EI 
makes convergence plausible but is not a proofo. Most 
physics texts on Monte Carlo methods either do not give 
a proof of convergence^ or refer the readertlu to rather ab- 
stract derivations in the mathematics literatures, or rely 
on the Frobenius-Perron theoremEl which is unfamiliar to 
most physicists. 

In this paper we present two proofs of convergence 
which are self contained and use only elementary meth- 
ods. We feel that it is useful to present these derivations 
here because (i) it is not widely known in the physics 
community that it is not difficult to prove convergence, 
at least for systems with a finite number of states, and 
(ii) our proofs are (to the best of our knowledge) different 
from and as simple as existing proofs in the mathematical 
literature. Even the proof of Ref. |l|, which is of compa- 
rable simplicity, is hard to understand physically; this is 
discussed more fully near the end of the paper. 

The first proof relies on the Monte Carlo algorithm 
satisfying the condition of detailed balance. Although 
this is true in essentially all Monte Carlo simulations, it 
is not strictly required for the algorithm to converge to 
the equilibrium distribution. In the second half of this 
paper, we shall also present a more general proof which 
relaxes this condition, and which is very different from 
the proof assuming detailed balance. 

Throughout this paper, we shall assume that the sys- 
tem being considered has a finite number of states. Most 
systems in physics can be approximated as such by dis- 
cretizing any continuous variables sufficiently finely; for 
instance, in molecular simulations, it should be permis- 
sible to limit the phase space for any particle to a suffi- 
ciently large region, and then discretize it in small inter- 
vals. A rigorous proof of convergence for systems with 
an infinite number of states is much more complicated!!!!. 

In the simplest case, the desired distribution that the 



Monte Carlo method seeks to simulate is the Boltzmann 
distribution at some temperature. However, in dealing 
with glassy systems, it is sometimes more efficient to 
simulate a different distribution, as in the multicanonipa.1 
ensembleO, the l/k ensembles, and parallel tempering^! 
The results presented in this paper are valid whenever 
the desired distribution is a stationary distribution of the 
algorithm, and detailed balance — or, more generally, er- 
godicity — is satisfied. This includes the cases mentioned 
above. 

The essential ingredients of the Monte Carlo method 
in statistical physics are the (non-negative) "transition 
rates", wi^ m , defined to be the probability that, given 
the system is in state / at "time" t, then it will be in state 
rn(^ I) at time t + 1. We define time to be incremented 
by one every Monte Carlo move (not sweep) and assume 
initially that all moves are equivalent, so the wi—> m do not 
depend on time. An example would be flipping a single 
spin chosen at random. The important case of sequential 
updating will be discussed later. 

The probability that the system is in state / at time t is 
defined to be Pi (t) . The evolution of these probabilities 
is governed by the "master equation" , 



Pi{t + 1) - Pi(t) = w — i - p i(t) wi- 



(i) 



The first term on the right hand side describes transitions 
into state I from m (which therefore increases Pi and so 
has a plus sign) while the second term describes transi- 
tions out of state I, which decreases P(l). Note that only 
terms with m ^ I contribute. We can also define wi^i 
to be the probability that the system stays in state I, i.e. 
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or equivalently, 
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= 1. 



(2) 



implies that the master equation can be written 

Pi(t + i) = yp m {t)w m ^i, (3) 



where the term m = I is now included. 

A necessary condition for the method to work is that 
the desired distribution, P cq , is stationary, i.e. if Pi(t) = 



1 



P° q for all I then P ( (t + 1) = P° q . This means that the 
right hand side of Eq. ([[]) must vanish for P — P oq . The 
condition of detailed balance consists of the assumption 
that each term on the right hand side of Eq. (Q) separately 
vanishes for P = P cq , i.e. 



P? q w m ^. 



(4) 



In the first part of this paper, we shall assume that 
Eq. (H) is satisfied. 

We start with the following quantity, which is a mea- 
sure of the deviation from equilibrium, 
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evaluated at time t, where the last expression follows 
because P and P eq are normalized. 

At time t + 1 we indicate (for compactness of notation) 
the probabilities by P[ and the corresponding value of G 
by G". We will show that G monotonically decreases, i.e. 



AG = G' - G < 0, 



(G) 



where the equality only holds if G and G' both vanish, 
so the system is in equilibrium. This shows that the 
system will eventually approach arbitrarily close to the 
equilibrium distribution. 

Using Eqs. (||) and (@), AG can be written as 
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(7) 



In the first term on the right hand side of Eq. (^) we use 
the detailed balance condition, Eq. (^), to replace w m ^i 
by wi^mP^/P^ 1 , and in the second term we can use 
Eq. (||) to insert a factor of J2 m w i->m (and interchange 
the indices I and m). This gives 
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(8) 



Applying the detailed balance relation again and incor- 
porating a factor of J^n w i->n, the last term in the above 
equation can be written as 



wi^ mWl ^ n Pf q (||!t) 



(9) 



Taking the half the sum of this and the same expression 
with m replaced by n, we finally get 
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(10) 



where terms with m = I and n = I are included. 



Eq. ( |lpp is the main result of this part of the paper. 
It shows that AG is definitely negative unless, for every 
state I, all states which can be reached from I in a single 
move — equivalently, with detailed balance, all states 
from which I can be reached in a single move — have 
probabilities proportional to the equilibrium probabili- 
ties. The most natural scenario is that all states sat- 
isfy this with the same proportionality constant (which 
must be unity) i.e. the system is in equilibrium. How- 
ever, AG also vanishes if P m /P^ q assumes different val- 
ues for states which have no common one-step descen- 
dants. Hence, to achieve full equilibrium, the algorithm 
must also be ergodic, i. e. starting from a given state, af- 
ter a sufficiently long time there is non-zero probability 
for the system to be in any state. The condition of ergod- 
icity is sufficient to ensure that even if AG is accidentally 
zero at some time step, it must decrease later, since any 
two states must have common descendants after several 
time steps. If in addition u;;^; ^ for all /, which is 
usually true, the one-step descendants of a set of states 
must include the set itself, so that it is not possible to 
break up all the states of the system into subsets with no 
common one-step descendants across two subsets. Thus 
if this condition is satisfied, AG cannot be zero (without 
the system being in equilibrium) at any time step. 

We will distinguish between a process which is ergodic 
and one which satisfies the lesser condition of being " irre- 
ducible" . In the latter, the system will eventually sample 
all states starting from a given initial stateO, but, at a 
fixed later time, the probability for some of the states is 
zero. A familiar example which is irreducible but not er- 
godic is the Ising model at infinite temperature simulated 
using Metropolis updating, for which the probability to 
flip is unity in this limit. Clearly after an odd time, the 
number of flipped spins must be odd and vice-versa. For 
such a non-ergodic system, it is possible for P m /P^ and 
P n /P° q to be different for states which have no common 
descendant at any fixed later time (for the Ising model 
example given here, states which differ by an odd number 
of spin flips). 

Note that Eq. ( |Io|) does not give an estimate for how 
fast equilibrium is reached. 

For random updating considered so far the probability 
of making a transition is the same for every move, i.e. 
writing Eq. (Q) as 



Pl(t + l) = ^2T lm P m (t), 



(11) 



then r, the transition matrix (related to w by r; TO = 
w m —>i), is the same for each "time" t. However, for 
sequential updating, the transition matrix depends on 
which site is being updated, so, for a complete sweep, we 
have 



r = p(l)p(2) . . .p(JV) 



(12) 



where rW is the transition matrix for updating spin i. 
Although the T^ 1 ' individually satisfy the detailed bal- 
ance condition, the transition matrix for the whole sweep, 
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r, does notp because the probability of the reverse tran- 
sition, m — > I say, for a whole sweep, is related to the 
probability of transition I — > m in the desired way only 
if the spins are updated in the reverse order. Despite 
overall lack of detailed balance, convergence to the equi- 
librium distribution is still obtained for sequential up- 
dating because G decreases at each step, as long as each 
of the transition probabilities, rW, satisfies the detailed 
balance condition. 

More generally, detailed balance is not necessary, see 
Rcf. ^ and references therein. In the rest of this paper, we 
give a derivation of the necessary and sufficient conditions 
for convergence to P cq : 

(i) the algorithm has P eq as a stationary distribution 

(ii) for any pair of states there exists some 
and some state kij such that at the T^'th time step 
the probabilities to have reached k from i and j are 
both non-zero. 

We shall see that (ii) (with (i)) implies ergodicity. 

We first prove that conditions (i) and (ii) are sufficient. 

Note that Eq. ([!]) is linear in the probabilities p , so that 

if we define 8 Pi = Pi — P ; cq , then (with condition (i)) 5 Pi 

also satisfies Eq. ([!]), with J^i $Pi — fl- 
it is convenient to use a measure of the deviation from 

equilibrium that is different from Eq. (j^): 

L = J2\6Pi\- (13) 
i 

Like G, the quantity L is positive unless the distribution 
has converged to P eq . We denote by Wi^, m (t) the prob- 
ability that the system, starting out in state 1, reaches 
state m after t time steps. This is obtained by 'iterating' 
the transition rates {w}. Then 

£ (<) = Z)lZ)Wi-m(W(0)l ^^^i-mWI^WI- 

ml m,l 

(14) 

Using the result 5^ m Wi_» m (i) = 1, we see that L{t) < 
L(0). We now compare the two sides of the inequality in 
Eq. (|l4|) , by choosing a specific value of m and carrying 
out the /-summation. If all states I for which W;_> m ^ 
have the same sign for <5P(0), it is clear that the l- 
summation on both sides are equal. On the other hand, 
if some of these states have 5P(0) > and others have 
<5P(0) < 0, the Z-summation on the left hand side has 
both positive and negative terms, and must be less than 
the corresponding sum on the right hand side. Thus so 
long as there is at least one state m which receives 'con- 
tributions' from two states with opposite <5P(0), i.e. 
<5Pi(0) and SPj (0) have opposite signs and Wj_> m ^ and 
Wj^ m ^ 0, we see that L(t) < L(0). Condition (ii) en- 
sures that for any this is the case for t — Ty. Thus 
L(t) stays constant until t = miapy], where the mini- 
mum is taken over all (ij) for which 8 Pi and 8Pj have 



the opposite sign, and then decreases at that time step. 
The time t can be reinitialized to zero at this point, and 
the whole argument repeated again. Note that nothing 
in the argument requires the transition rates wi—> m to be 
time independent, so long as conditions (i) and (ii) are al- 
ways satisfied. Also, as with the first approach above, no 
estimate has been obtained for how fast L(t) approaches 
zero. 

It may seem surprising that L(t ) need not decrease at 
every time step, whereas Eq. ( JX0| ) shows that G must. 
If we start out with 8P positive and negative on two 
states that are well separated from each other (in the 
sense that many time steps are required before the two 
states have common descendants), it is clear that L{t) 
does not change at first. However, G decreases because 
the positive and negative SP's both get 'smeared out' 
over several states. Thus different measures can depict 
the approach to equilibrium at different rates. Since any 
physical observable O is given by (O) = JZjOji^, the 
error (SO) < J2i \Oi8Pi\ < Lmax;[|0;|], so that L is a 
conservative indicator of how observables approach equi- 
librium. 

We have proved the sufficiency of conditions (i) and 
(ii); the necessity can be easily demonstrated. The ne- 
cessity of (i) is obvious. If (ii) is violated, starting with 
an initial condition 5Pj(0) = — SPj(0) and all other SPi's 
equal to zero, the positive and negative regions for SP 
stay separate for all time and cannot neutralize each 
other. 

Condition (ii) (with (i)) is equivalent to the (seemingly 
stronger) condition of ergodicity. If the system were not 
ergodic, then starting with p(0) = Su would lead to a 
P(t) with Pj(t) — for some j, since all states would 
not be accessible from the state i at time t. Therefore 
P(t) could not have converged to P cq . Since we have 
seen that (i) and (ii) imply convergence, they must imply 
ergodicity. 

The proof of convergence in Ref. [l] is comparable in 
simplicity to the proof given above. However, it applies 
the transpose T T of the transition matrix to the initial 
state of the system (represented as a column vector) , and 
shows that under iteration the initial state evolves to a 
column vector whose entries are all identical. Although 
it is possibletl to deduce the desired properties of T from 
this, there is no physical interpretation of evolution un- 
der r T , which does not even conserve probability. In con- 
trast, the proof given above shows how fluctuations from 
the desired distribution are "mixed" by time evolution, 
with positive fluctuations annihilating negative ones. It 
is also physically clear why ergodicity is essential for this 
annihilation process to proceed to completion: positive 
and negative fluctuations are then assured of encounter- 
ing each other under time evolution. 

Although we have considered the conditions for con- 
vergence to P cq , in practice Monte Carlo simulations are 
not carried out using such an ensemble average, but by 
simulating a single system and taking a time average of 
many measurements. If we restrict ourselves to the case 
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when the transition matrix T is timejpdependent, this 
defines a homogeneous Markov chainti It can then be 
shown that if the Markov chain is irreducible and has P oq 
as a stationary distribution, the time average {P(t))t con- 
verges to P cq . We indicate how to prove this result here: 
since L(t) < L(0) in general, none of thp-jeigenvalues of 
r can havc_modulus greater than unitjJ13. Irreducibil- 
ity ensuresEj that the eigenvalue 1 has a unique eigen- 
vector, P cq . The time averaging removes contributions 
from oscillatory eigenvalues of T, of the form e z6 (6 ^ 0). 
By comparison, the stronger condition of ergodicity (ii) 
(with (i)) is enough to rule out oscillatory eigenvalues, 
since no time averaging is needed for convergence to P eq . 
It is not clear how this proof that irreducibility is suffi- 
cient would generalize to time dependent transition rates, 
since the evolution of P is then not directly related to the 
eigenvalues of T(t). 

We thank M. E. J. Newman for useful correspondence. 
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